feat(visual): reserve ViT worst-case activation memory #1378

Open
sufubao wants to merge 1 commit into ModelTC:main from sufubao:vit-worst-case-mem-reserve

@sufubao sufubao commented Jul 2, 2026

Collaborator

Summary

  • reserve and hold peak ViT activation memory during visual worker startup so co-located LLM KV-pool sizing sees the reduced available GPU memory
  • add worst-case builders for InternVL/Qwen VL visual towers plus a manual --visual_reserved_mem_gb override
  • publish visual reservations through shared memory and include the value in max-length diagnostics
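
To illustrate why the reservation matters for KV-pool sizing, here is a minimal sketch of the arithmetic, not LightLLM's actual sizing code. The function names, the per-token formula, and the example model dimensions are all assumptions chosen for illustration.

```python
GIB = 1024 ** 3


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int, dtype_bytes: int) -> int:
    """Bytes of KV cache one token occupies: a K and a V vector per layer (hypothetical helper)."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def kv_pool_token_capacity(free_bytes: int, vit_reserved_bytes: int, bytes_per_token: int) -> int:
    """Tokens that fit in the KV pool once the ViT reservation is subtracted (hypothetical helper)."""
    if bytes_per_token <= 0:
        raise ValueError("bytes_per_token must be positive")
    # Clamp at zero so an oversized reservation never yields a negative pool.
    usable = max(free_bytes - vit_reserved_bytes, 0)
    return usable // bytes_per_token


# Assumed example model: 32 layers, 8 KV heads, head_dim 128, fp16 (2 bytes).
per_token = kv_bytes_per_token(32, 8, 128, 2)
without_reserve = kv_pool_token_capacity(40 * GIB, 0, per_token)
with_reserve = kv_pool_token_capacity(40 * GIB, 4 * GIB, per_token)
```

With these assumed numbers, a 4 GiB ViT reservation shrinks the pool from 327680 to 294912 tokens. If the LLM sized its pool without seeing the reservation, that difference is memory the visual tower would later have to compete for.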

Tests

  • python -m py_compile lightllm/common/basemodel/basemodel.py lightllm/models/qwen2_5_vl/qwen2_5_visual.py lightllm/models/qwen2_vl/qwen2_visual.py lightllm/models/qwen3_vl/qwen3_visual.py lightllm/models/vit/model.py lightllm/server/api_cli.py lightllm/server/visualserver/model_infer/__init__.py lightllm/server/visualserver/model_infer/mem_reserve.py lightllm/server/visualserver/model_infer/model_rpc.py lightllm/server/visualserver/model_infer/worst_case_reserve.py
  • pure-function check for compute_qwen_worst_case_grid upper-bound rounding
  • CLI parse check for --visual_reserved_mem_gb
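
The two lightweight checks listed above could look roughly like the following sketch. The flag name comes from this PR, but its type (float) and default (None) are assumptions, and round_up_to_multiple is a hypothetical stand-in for the rounding behaviour, not the real compute_qwen_worst_case_grid.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the flag under test; type and default are assumed."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--visual_reserved_mem_gb",
        type=float,
        default=None,
        help="manual override for the visual tower's reserved memory, in GB",
    )
    return parser


def round_up_to_multiple(value: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= value (upper-bound rounding)."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    # Ceiling division with integers only, then scale back up.
    return -(-value // multiple) * multiple


parsed = build_parser().parse_args(["--visual_reserved_mem_gb", "2.5"])
unset = build_parser().parse_args([])
```

Rounding up rather than down is the point of an upper-bound check: a worst-case estimate must never come out smaller than the largest input the visual tower can receive.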

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a mechanism to estimate and reserve worst-case activation memory for co-located Vision Transformers (ViT) to prevent out-of-memory (OOM) errors during runtime. It implements shared memory-based communication between the visual worker and the LLM router to report reserved memory, integrates worst-case memory reservation mixins for Qwen-VL and InternVL models, and provides a manual override command-line argument (--visual_reserved_mem_gb). Feedback on the changes highlights a potential race condition during concurrent startup where the LLM worker might attempt to read the shared memory before the visual worker has initialized it, which could cause a startup crash. It is recommended to wrap this lookup in a try-except block to ensure robustness.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

# assumes global_rank == index into visual_gpu_ids (matching how visual ranks call publish_vit_reserved_mem)
for global_rank, dev in enumerate(gpu_ids):
    if dev == device_id:
        total += int(SharedInt(get_vit_reserved_shm_name(dev, global_rank)).get_value())

high

During concurrent startup of the LLM worker and the visual worker, the LLM worker may call read_vit_reserved_mem_for_device before the visual worker has published its reserved memory via publish_vit_reserved_mem. In that case SharedInt will raise an exception (such as FileNotFoundError or ValueError) because the shared memory segment does not exist yet, and the LLM worker crashes during startup. Wrapping the lookup in a try...except block prevents the crash.

Suggested change
-        total += int(SharedInt(get_vit_reserved_shm_name(dev, global_rank)).get_value())
+        try:
+            total += int(SharedInt(get_vit_reserved_shm_name(dev, global_rank)).get_value())
+        except Exception:
+            pass
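
As a self-contained illustration of the race described above, the sketch below uses the standard library's multiprocessing.shared_memory as a stand-in for SharedInt, whose real behaviour is not shown on this page. All function names and the segment naming scheme are hypothetical. It catches only FileNotFoundError, which is narrower than the bare `except Exception` in the suggestion.

```python
import os
import struct
from multiprocessing import shared_memory

_INT64 = struct.Struct("<q")  # one little-endian signed 64-bit integer


def segment_name(prefix: str, device_id: int, rank: int) -> str:
    """Hypothetical naming scheme: one segment per (device, rank) pair."""
    return f"{prefix}_{device_id}_{rank}"


def publish_reserved_bytes(name: str, value: int) -> shared_memory.SharedMemory:
    """Writer side: create the segment and store the reserved byte count."""
    shm = shared_memory.SharedMemory(name=name, create=True, size=_INT64.size)
    _INT64.pack_into(shm.buf, 0, value)
    return shm  # caller keeps it alive, then calls close() and unlink()


def read_reserved_bytes(name: str) -> int:
    """Reader side: return 0 if the writer has not created the segment yet."""
    try:
        shm = shared_memory.SharedMemory(name=name, create=False)
    except FileNotFoundError:
        return 0
    try:
        return _INT64.unpack_from(shm.buf, 0)[0]
    finally:
        shm.close()


def read_total_for_device(prefix: str, device_id: int, gpu_ids: list) -> int:
    """Sum reservations of every visual rank placed on device_id."""
    total = 0
    for rank, dev in enumerate(gpu_ids):
        if dev == device_id:
            total += read_reserved_bytes(segment_name(prefix, dev, rank))
    return total


def demo_prefix() -> str:
    """Per-process prefix so repeated runs do not collide on segment names."""
    return f"vit_reserved_demo_{os.getpid()}"
```

Treating a missing segment as zero avoids the crash, but a reader that runs too early then sees no reservation at all. Whether that is acceptable depends on the startup ordering between the visual and LLM workers, which this page does not specify.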

Reserve peak ViT activation memory during visual worker startup so the co-located LLM router sizes its KV pool after the visual tower has already reached its worst-case allocator high-water mark.

Add a manual --visual_reserved_mem_gb override for unsupported visual models and include the reserved amount in max-length diagnostics.
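
A minimal sketch of how a manual override and a diagnostic line might fit together, under stated assumptions: the override takes precedence over a measured value, "GB" is read as 2^30 bytes, and the message format is invented for illustration. None of these details are confirmed by this page.

```python
from typing import Optional

GIB = 1024 ** 3


def resolve_reserved_bytes(override_gb: Optional[float], measured_bytes: Optional[int]) -> int:
    """Pick the reservation: manual override first, else the measured value, else 0."""
    if override_gb is not None:
        if override_gb < 0:
            raise ValueError("override must be non-negative")
        return int(override_gb * GIB)
    return measured_bytes if measured_bytes is not None else 0


def format_diagnostic(max_len: int, reserved_bytes: int) -> str:
    """Hypothetical diagnostic line that reports the reservation next to the max length."""
    return f"max_len={max_len} vit_reserved={reserved_bytes / GIB:.2f}GiB"


override_case = resolve_reserved_bytes(1.5, 123)
measured_case = resolve_reserved_bytes(None, 123)
message = format_diagnostic(8192, 2 * GIB)
```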
@sufubao force-pushed the vit-worst-case-mem-reserve branch from 870e44a to a3c5035 on July 2, 2026 06:59